# import required libraries
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
import warnings
warnings.filterwarnings("ignore")
from PIL import Image
i) The joint probability of the people who planned to purchase and actually placed an order, from the given table, is given by -
P (Planned to purchase and actually placed an order) = 400/2000 = 20% or 0.2
ii) The conditional probability of the people who actually placed an order, given that they planned to purchase, is given by -
P (Actually placed an order | Planned to purchase) = 400/500 = 80% or 0.8
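The two results above can be reproduced directly from the counts. This is a minimal sketch; the variable names are chosen here for illustration and are not part of the original notebook.

```python
# Counts taken from the figures quoted above
total_people = 2000        # everyone in the table
planned = 500              # people who planned to purchase
planned_and_ordered = 400  # planned to purchase AND actually placed an order

# Joint probability: P(planned and ordered)
p_joint = planned_and_ordered / total_people

# Conditional probability: P(ordered | planned) = P(planned and ordered) / P(planned)
p_planned = planned / total_people
p_conditional = p_joint / p_planned

print(p_joint, p_conditional)
```

The conditional probability divides by the 500 people who planned to purchase, not by the full 2000, which is why it is larger than the joint probability.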
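This is a Binomial Distribution probability problem, since there are only two outcomes in this case, namely whether a given item is DEFECTIVE or NOT DEFECTIVE. In the code below p = 0.95 is the probability that an item is NOT defective, so k counts the non-defective items out of 10 (k = 10 means no defective items).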
n=10
p=.95
k=np.arange(0,11)
k
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
binomial = stats.binom.pmf(k,n,p)
binomial
array([9.76562500e-14, 1.85546875e-11, 1.58642578e-09, 8.03789063e-08,
2.67259863e-06, 6.09352488e-05, 9.64808106e-04, 1.04750594e-02,
7.46347985e-02, 3.15124705e-01, 5.98736939e-01])
n=len(binomial)
n
11
sum(binomial)
1.0000000000000004
A. Probability that none of the items are defective is-
float(binomial[10])
0.5987369392383787
B. Probability that exactly one of the items is defective is-
float(binomial[9])
0.3151247048623052
C. Probability that two or fewer of the items are defective is-
stats.binom.cdf(k=2, n=10, p=0.05)
0.9884964426207031
D. Probability that three or more of the items are defective is- 1-P(k<=2)
1-stats.binom.cdf(k=2, n=10, p=0.05)
0.01150355737929687
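As a cross-check, all four answers can be computed from the defective probability p = 0.05 directly, so that k counts defective items throughout. This is a sketch under that assumption; `n_items` and `p_defective` are names introduced here for illustration.

```python
from scipy import stats

n_items = 10        # sample size, as in the cells above
p_defective = 0.05  # probability that a single item is defective (1 - 0.95)

p_none = stats.binom.pmf(0, n_items, p_defective)          # A: no defective items
p_exactly_one = stats.binom.pmf(1, n_items, p_defective)   # B: exactly one defective
p_two_or_fewer = stats.binom.cdf(2, n_items, p_defective)  # C: P(X <= 2)
p_three_or_more = stats.binom.sf(2, n_items, p_defective)  # D: sf(2) = P(X > 2)

print(p_none, p_exactly_one, p_two_or_fewer, p_three_or_more)
```

Using `sf` for part D avoids the explicit `1 - cdf` subtraction and should agree with the value computed above.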
This is a Poisson Distribution probability problem, since there is no upper bound on the number of cars sold and only the average rate of cars sold per week is given
rate = 3
n=np.arange(0,20)
poisson = stats.poisson.pmf(n,rate)
poisson
array([4.97870684e-02, 1.49361205e-01, 2.24041808e-01, 2.24041808e-01,
1.68031356e-01, 1.00818813e-01, 5.04094067e-02, 2.16040315e-02,
8.10151179e-03, 2.70050393e-03, 8.10151179e-04, 2.20950322e-04,
5.52375804e-05, 1.27471339e-05, 2.73152870e-06, 5.46305740e-07,
1.02432326e-07, 1.80762929e-08, 3.01271548e-09, 4.75691918e-10])
A. Probability that in a given week he will sell some cars is-
1-poisson[0]
0.950212931632136
B. Probability that in a given week he will sell 2 or more but less than 5 cars is-
(poisson[2]+poisson[3]+poisson[4])
0.6161149710523164
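The same two probabilities can also be obtained without building the PMF array, using the cumulative functions of `stats.poisson`. This is a small sketch; the variable names are illustrative.

```python
from scipy import stats

rate = 3  # average number of cars sold per week, as above

# A: P(X >= 1) = P(X > 0), the survival function evaluated at 0
p_some = stats.poisson.sf(0, rate)

# B: P(2 <= X < 5) = P(X <= 4) - P(X <= 1)
p_two_to_four = stats.poisson.cdf(4, rate) - stats.poisson.cdf(1, rate)

print(p_some, p_two_to_four)
```

This form does not depend on how far the PMF array was truncated (here at 19 cars).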
C. Plot of the Poisson distribution of cars sold per week vs number of cars sold per week (the code plots the probability of each individual count, i.e. the PMF)-
plt.plot(n,poisson,'o-')
plt.title(r'Poisson: $\lambda$ = %i ' % rate) # raw string so that \l is not read as an escape sequence
plt.xlabel('Number of Cars sold per week')
plt.ylabel('Probability of Number of Cars sold per week')
plt.show()
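The plot above shows the probability of each individual count. If the cumulative probability is wanted instead, a sketch along the following lines would plot P(X <= k) using `stats.poisson.cdf`. The names `cars` and `cumulative` are introduced here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rate = 3                  # average number of cars sold per week
cars = np.arange(0, 20)   # same range of counts as above
cumulative = stats.poisson.cdf(cars, rate)  # P(X <= k) for each k

fig, ax = plt.subplots()
ax.step(cars, cumulative, where='post')     # a step plot suits a discrete CDF
ax.set_title(r'Poisson CDF: $\lambda$ = %i' % rate)
ax.set_xlabel('Number of Cars sold per week')
ax.set_ylabel('Cumulative probability')
# In a notebook the figure is displayed at the end of the cell;
# in a script, call plt.show() here.
```

The curve starts at P(X = 0), about 0.05, and approaches 1 as the number of cars grows.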
n1=3
p1=.868
j =np.arange(0,4)
j
array([0, 1, 2, 3])
binomial1 = stats.binom.pmf(j,n1,p1)
binomial1
array([0.00229997, 0.0453721 , 0.2983559 , 0.65397203])
sum(binomial1)
1.0
A. Probability that all three orders will be recognised correctly is-
float(binomial1[3])
0.653972032
B. Probability that none of the three orders will be recognised correctly is-
float(binomial1[0])
0.002299968
C. Probability that at least two of the three orders will be recognised correctly is-
float(binomial1[2]) + float(binomial1[3])
0.9523279359999999
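The three answers can be verified by hand from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below uses only the standard library; `binom_pmf` is a helper defined here for illustration, not a library function.

```python
from math import comb

n_orders = 3       # number of orders
p_correct = 0.868  # probability that one order is recognised correctly

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), from the closed-form formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_all = binom_pmf(3, n_orders, p_correct)    # A: all three correct
p_none = binom_pmf(0, n_orders, p_correct)   # B: none correct
p_at_least_two = binom_pmf(2, n_orders, p_correct) + binom_pmf(3, n_orders, p_correct)  # C

print(p_all, p_none, p_at_least_two)
```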
The pattern of marks follows a Normal Distribution
A. Percentage of students who score more than 80 is
z1=(80-60)/12
z1
1.6666666666666667
k=stats.norm.cdf(1.6666666666666667)
k
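0.9522096477271853
This value is P(score <= 80). The percentage of students who score more than 80 is therefore 1 - 0.9522 = 0.0478, i.e. about 4.78%.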
B. Percentage of students who score less than 50 is
z2=(50-60)/12
z2
-0.8333333333333334
m=stats.norm.cdf(-0.8333333333333334)
m
0.20232838096364308
C. What should be the distinction mark if the highest 10% of students are to be awarded distinction
b=(norm.ppf(0.9)*12)+60
b
75.3786187865352
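The three parts can also be computed without converting to z-scores by hand, by passing the mean and standard deviation to `scipy.stats.norm`. The sketch assumes mean 60 and standard deviation 12, as implied by the z-score calculations above. Note that the share scoring more than 80 is an upper-tail probability, so it uses `sf` (1 - CDF).

```python
from scipy.stats import norm

mean_marks, sd_marks = 60, 12  # implied by z = (x - 60) / 12 above

p_above_80 = norm.sf(80, loc=mean_marks, scale=sd_marks)   # A: P(X > 80)
p_below_50 = norm.cdf(50, loc=mean_marks, scale=sd_marks)  # B: P(X < 50)
# C: mark exceeded by only the top 10% of students
distinction_mark = norm.isf(0.10, loc=mean_marks, scale=sd_marks)

print(round(p_above_80 * 100, 2), round(p_below_50 * 100, 2), round(distinction_mark, 2))
```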
One real life industry scenario where we can use the concepts of Applied statistics to get a data driven business solution is-
Data Analytics as a means to Boost Customer Acquisition and Retention
The customer is the most important asset any business depends on. Tracking online customer activity and using that data to predict the next purchase, and the chance of a purchase, draws on concepts of Applied Statistics and data analysis, and allows businesses to observe customer-related patterns and trends. Observing customer behaviour is important for boosting sales, and understanding customer insights allows businesses to deliver what customers want.
Example - E-commerce companies that use Big Data analysis to predict customer behaviour and suggest recommendations
-------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX--------------------
The dataset contains information on all the men's top professional basketball teams of the American league system that have participated in all the past tournaments. It has data about how many baskets each team scored and conceded, how many times they finished within the first 2 positions, how many tournaments they have qualified for, their best position in the past, etc.
The company's management wants to invest in proposals for managing some of the best teams in the league. The analytics department has been assigned the task of creating a report on the performance shown by the teams. Some of the older teams are already in contract with competitors. Hence Company X wants to understand which teams they can approach for a deal that would be a win for them.
data = pd.read_csv('C:/Users/Server/Desktop/APPLIED STATISTICS PROJECT/Basketball.csv')
data.shape
(61, 13)
data.dtypes.value_counts()
int64     11
object     2
dtype: int64
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 61 entries, 0 to 60
Data columns (total 13 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Team                 61 non-null     object
 1   Tournament           61 non-null     int64
 2   Score                61 non-null     int64
 3   PlayedGames          61 non-null     int64
 4   WonGames             61 non-null     int64
 5   DrawnGames           61 non-null     int64
 6   LostGames            61 non-null     int64
 7   BasketScored         61 non-null     int64
 8   BasketGiven          61 non-null     int64
 9   TournamentChampion   61 non-null     int64
 10  Runner-up            61 non-null     int64
 11  TeamLaunch           61 non-null     object
 12  HighestPositionHeld  61 non-null     int64
dtypes: int64(11), object(2)
memory usage: 6.3+ KB
data.head()
| | Team | Tournament | Score | PlayedGames | WonGames | DrawnGames | LostGames | BasketScored | BasketGiven | TournamentChampion | Runner-up | TeamLaunch | HighestPositionHeld |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Team 1 | 86 | 4385 | 2762 | 1647 | 552 | 563 | 5947 | 3140 | 33 | 23 | 1929 | 1 |
| 1 | Team 2 | 86 | 4262 | 2762 | 1581 | 573 | 608 | 5900 | 3114 | 25 | 25 | 1929 | 1 |
| 2 | Team 3 | 80 | 3442 | 2614 | 1241 | 598 | 775 | 4534 | 3309 | 10 | 8 | 1929 | 1 |
| 3 | Team 4 | 82 | 3386 | 2664 | 1187 | 616 | 861 | 4398 | 3469 | 6 | 6 | 1931-32 | 1 |
| 4 | Team 5 | 86 | 3368 | 2762 | 1209 | 633 | 920 | 4631 | 3700 | 8 | 7 | 1929 | 1 |
dupes = data.duplicated()
sum(dupes)
0
There are NO duplicate values in the dataset
data.isnull().values.any()
False
There are no missing values in the dataset
pd.DataFrame( data.isnull().sum(), columns= ['Number of missing values'])
| | Number of missing values |
|---|---|
| Team | 0 |
| Tournament | 0 |
| Score | 0 |
| PlayedGames | 0 |
| WonGames | 0 |
| DrawnGames | 0 |
| LostGames | 0 |
| BasketScored | 0 |
| BasketGiven | 0 |
| TournamentChampion | 0 |
| Runner-up | 0 |
| TeamLaunch | 0 |
| HighestPositionHeld | 0 |
# Replace NULL values with the MEDIAN of the column because the median is less affected by outliers
data.loc[60,'Score'] = data.Score.median()
data.loc[60,'PlayedGames'] = data.PlayedGames.median()
data.loc[60,'WonGames'] = data.WonGames.median()
data.loc[60,'DrawnGames'] = data.DrawnGames.median()
data.loc[60,'LostGames'] = data.LostGames.median()
data.loc[60,'BasketScored'] = data.BasketScored.median()
data.loc[60,'BasketGiven'] = data.BasketGiven.median()
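The same idea can be written for several columns at once with `fillna`. The sketch below runs on a small hypothetical frame (`toy`, with made-up missing values in its last row), not on the basketball data, to show the pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: the last row has missing values
toy = pd.DataFrame({
    'Score': [4385.0, 3442.0, 34.0, np.nan],
    'WonGames': [1647.0, 1241.0, 8.0, np.nan],
})

numeric_cols = ['Score', 'WonGames']
# median() skips NaN by default, so each gap is filled with the median
# of the observed values in its own column
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].median())

print(toy)
```

Unlike assigning to a fixed row label, this fills every missing cell in the listed columns, wherever it occurs.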
data
| | Team | Tournament | Score | PlayedGames | WonGames | DrawnGames | LostGames | BasketScored | BasketGiven | TournamentChampion | Runner-up | TeamLaunch | HighestPositionHeld |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Team 1 | 86 | 4385 | 2762 | 1647 | 552 | 563 | 5947 | 3140 | 33 | 23 | 1929 | 1 |
| 1 | Team 2 | 86 | 4262 | 2762 | 1581 | 573 | 608 | 5900 | 3114 | 25 | 25 | 1929 | 1 |
| 2 | Team 3 | 80 | 3442 | 2614 | 1241 | 598 | 775 | 4534 | 3309 | 10 | 8 | 1929 | 1 |
| 3 | Team 4 | 82 | 3386 | 2664 | 1187 | 616 | 861 | 4398 | 3469 | 6 | 6 | 1931-32 | 1 |
| 4 | Team 5 | 86 | 3368 | 2762 | 1209 | 633 | 920 | 4631 | 3700 | 8 | 7 | 1929 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 56 | Team 57 | 1 | 34 | 38 | 8 | 10 | 20 | 38 | 66 | 0 | 0 | 2009-10 | 20 |
| 57 | Team 58 | 1 | 22 | 30 | 7 | 8 | 15 | 37 | 57 | 0 | 0 | 1956-57 | 16 |
| 58 | Team 59 | 1 | 19 | 30 | 7 | 5 | 18 | 51 | 85 | 0 | 0 | 1951~52 | 16 |
| 59 | Team 60 | 1 | 14 | 30 | 5 | 4 | 21 | 34 | 65 | 0 | 0 | 1955-56 | 15 |
| 60 | Team 61 | 1 | 375 | 423 | 123 | 95 | 197 | 430 | 632 | 0 | 0 | 2017-18 | 9 |
61 rows × 13 columns
data.describe()
| | Tournament | Score | PlayedGames | WonGames | DrawnGames | LostGames | BasketScored | BasketGiven | TournamentChampion | Runner-up | HighestPositionHeld |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 61.000000 | 61.000000 | 61.000000 | 61.000000 | 61.000000 | 61.000000 | 61.000000 | 61.000000 | 61.000000 | 61.000000 | 61.000000 |
| mean | 24.000000 | 907.573770 | 803.754098 | 305.983607 | 190.491803 | 306.983607 | 1147.393443 | 1150.590164 | 1.426230 | 1.409836 | 7.081967 |
| std | 26.827225 | 1130.943639 | 871.532896 | 405.762800 | 200.680561 | 292.394795 | 1502.315638 | 1156.178985 | 5.472535 | 4.540107 | 5.276663 |
| min | 1.000000 | 14.000000 | 30.000000 | 5.000000 | 4.000000 | 15.000000 | 34.000000 | 55.000000 | 0.000000 | 0.000000 | 1.000000 |
| 25% | 4.000000 | 107.000000 | 116.000000 | 35.000000 | 27.000000 | 63.000000 | 155.000000 | 241.000000 | 0.000000 | 0.000000 | 3.000000 |
| 50% | 12.000000 | 375.000000 | 423.000000 | 123.000000 | 95.000000 | 197.000000 | 430.000000 | 632.000000 | 0.000000 | 0.000000 | 6.000000 |
| 75% | 38.000000 | 1351.000000 | 1318.000000 | 426.000000 | 330.000000 | 563.000000 | 1642.000000 | 1951.000000 | 0.000000 | 0.000000 | 10.000000 |
| max | 86.000000 | 4385.000000 | 2762.000000 | 1647.000000 | 633.000000 | 1070.000000 | 5947.000000 | 3889.000000 | 33.000000 | 25.000000 | 20.000000 |
data.skew()
Tournament             1.217038
Score                  1.600011
PlayedGames            1.147881
WonGames               1.812477
DrawnGames             1.009590
LostGames              0.902817
BasketScored           1.784347
BasketGiven            0.982324
TournamentChampion     4.777021
Runner-up              4.360643
HighestPositionHeld    0.817976
dtype: float64
data.cov()
| | Tournament | Score | PlayedGames | WonGames | DrawnGames | LostGames | BasketScored | BasketGiven | TournamentChampion | Runner-up | HighestPositionHeld |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Tournament | 719.700000 | 2.973738e+04 | 2.331498e+04 | 10565.066667 | 5319.850000 | 7434.300000 | 3.924865e+04 | 3.059685e+04 | 86.483333 | 78.666667 | -100.233333 |
| Score | 29737.383333 | 1.279034e+06 | 9.656931e+05 | 457627.559563 | 217482.579781 | 290694.926230 | 1.693337e+06 | 1.234347e+06 | 4427.318033 | 3929.177596 | -3998.947814 |
| PlayedGames | 23314.983333 | 9.656931e+05 | 7.595696e+05 | 342175.195902 | 173846.906284 | 243623.545902 | 1.270120e+06 | 9.976242e+05 | 2745.989891 | 2508.085792 | -3273.462842 |
| WonGames | 10565.066667 | 4.576276e+05 | 3.421752e+05 | 164643.449727 | 76512.924863 | 101062.716393 | 6.091624e+05 | 4.344181e+05 | 1672.440437 | 1470.440164 | -1389.031967 |
| DrawnGames | 5319.850000 | 2.174826e+05 | 1.738469e+05 | 76512.924863 | 40272.687432 | 57081.108197 | 2.841370e+05 | 2.302824e+05 | 553.753552 | 516.578415 | -763.474317 |
| LostGames | 7434.300000 | 2.906949e+05 | 2.436235e+05 | 101062.716393 | 57081.108197 | 85494.716393 | 3.769811e+05 | 3.330032e+05 | 520.223770 | 521.490164 | -1121.365301 |
| BasketScored | 39248.650000 | 1.693337e+06 | 1.270120e+06 | 609162.423224 | 284137.036612 | 376981.089891 | 2.256952e+06 | 1.617657e+06 | 6117.512842 | 5397.736066 | -5149.782787 |
| BasketGiven | 30596.850000 | 1.234347e+06 | 9.976242e+05 | 434418.109836 | 230282.404918 | 333003.243169 | 1.617657e+06 | 1.336750e+06 | 2989.760929 | 2805.487432 | -4441.549180 |
| TournamentChampion | 86.483333 | 4.427318e+03 | 2.745990e+03 | 1672.440437 | 553.753552 | 520.223770 | 6.117513e+03 | 2.989761e+03 | 29.948634 | 24.139071 | -8.818852 |
| Runner-up | 78.666667 | 3.929178e+03 | 2.508086e+03 | 1470.440164 | 516.578415 | 521.490164 | 5.397736e+03 | 2.805487e+03 | 24.139071 | 20.612568 | -8.634153 |
| HighestPositionHeld | -100.233333 | -3.998948e+03 | -3.273463e+03 | -1389.031967 | -763.474317 | -1121.365301 | -5.149783e+03 | -4.441549e+03 | -8.818852 | -8.634153 | 27.843169 |
data.corr()
| | Tournament | Score | PlayedGames | WonGames | DrawnGames | LostGames | BasketScored | BasketGiven | TournamentChampion | Runner-up | HighestPositionHeld |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Tournament | 1.000000 | 0.980135 | 0.997185 | 0.970564 | 0.988140 | 0.947752 | 0.973840 | 0.986452 | 0.589072 | 0.645876 | -0.708071 |
| Score | 0.980135 | 1.000000 | 0.979748 | 0.997238 | 0.958249 | 0.879077 | 0.996647 | 0.943998 | 0.715338 | 0.765235 | -0.670109 |
| PlayedGames | 0.997185 | 0.979748 | 1.000000 | 0.967593 | 0.993981 | 0.956017 | 0.970063 | 0.990052 | 0.575740 | 0.633859 | -0.711810 |
| WonGames | 0.970564 | 0.997238 | 0.967593 | 1.000000 | 0.939631 | 0.851822 | 0.999309 | 0.925999 | 0.753165 | 0.798195 | -0.648755 |
| DrawnGames | 0.988140 | 0.958249 | 0.993981 | 0.939631 | 1.000000 | 0.972786 | 0.942457 | 0.992500 | 0.504223 | 0.566976 | -0.720991 |
| LostGames | 0.947752 | 0.879077 | 0.956017 | 0.851822 | 0.972786 | 1.000000 | 0.858200 | 0.985040 | 0.325111 | 0.392835 | -0.726805 |
| BasketScored | 0.973840 | 0.996647 | 0.970063 | 0.999309 | 0.942457 | 0.858200 | 1.000000 | 0.931323 | 0.744090 | 0.791379 | -0.649633 |
| BasketGiven | 0.986452 | 0.943998 | 0.990052 | 0.925999 | 0.992500 | 0.985040 | 0.931323 | 1.000000 | 0.472523 | 0.534462 | -0.728031 |
| TournamentChampion | 0.589072 | 0.715338 | 0.575740 | 0.753165 | 0.504223 | 0.325111 | 0.744090 | 0.472523 | 1.000000 | 0.971552 | -0.305397 |
| Runner-up | 0.645876 | 0.765235 | 0.633859 | 0.798195 | 0.566976 | 0.392835 | 0.791379 | 0.534462 | 0.971552 | 1.000000 | -0.360408 |
| HighestPositionHeld | -0.708071 | -0.670109 | -0.711810 | -0.648755 | -0.720991 | -0.726805 | -0.649633 | -0.728031 | -0.305397 | -0.360408 | 1.000000 |
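The correlation table is the covariance table rescaled by the standard deviations: r(X, Y) = cov(X, Y) / (std(X) * std(Y)). As a quick check, the sketch below recomputes the Score vs WonGames correlation from the values printed by `data.cov()` and `data.describe()` above. The numbers are copied from those outputs; the variable names are illustrative.

```python
# Values copied from the data.cov() and data.describe() outputs above
cov_score_won = 457627.559563  # cov(Score, WonGames)
std_score = 1130.943639        # std of Score
std_won = 405.762800           # std of WonGames

# Pearson correlation = covariance divided by the product of the standard deviations
r_score_won = cov_score_won / (std_score * std_won)

print(round(r_score_won, 4))   # should be close to the 0.997238 shown in data.corr()
```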
sns.pairplot(data, kind="reg")
plt.show()
print("data:",data.mean())
data: Tournament               24.000000
Score                   907.573770
PlayedGames             803.754098
WonGames                305.983607
DrawnGames              190.491803
LostGames               306.983607
BasketScored           1147.393443
BasketGiven            1150.590164
TournamentChampion        1.426230
Runner-up                 1.409836
HighestPositionHeld       7.081967
dtype: float64
print(data['Score'].mode())
0    375
dtype: int64
print("data:",data.median())
data: Tournament              12.0
Score                  375.0
PlayedGames            423.0
WonGames               123.0
DrawnGames              95.0
LostGames              197.0
BasketScored           430.0
BasketGiven            632.0
TournamentChampion       0.0
Runner-up                0.0
HighestPositionHeld      6.0
dtype: float64
mean=data['Score'].mean()
median=data['Score'].median()
mode=data['Score'].mode()
print('Mean: ',mean,'\nMedian: ',median,'\nMode: ',mode)
plt.figure(figsize=(10,5)) # set the figure size
plt.hist(data['Score'],bins=100,color='lightblue') #Plot the histogram
plt.axvline(mean,color='green',label='Mean') # Draw lines on the plot for the mean, median and mode of Score
plt.axvline(median,color='blue',label='Median')
plt.axvline(mode[0],color='red',label='Mode')
plt.xlabel('Score') # label the x-axis
plt.ylabel('Frequency') # label the y-axis
plt.legend() # Plot the legend
plt.show()
Mean:  907.5737704918033
Median:  375.0
Mode:  0    375
dtype: int64
This is right-skewed data: the mean (907.57) is well above the median (375), and the skewness of Score is positive (1.60).
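One quick way to quantify this from summary statistics alone is Pearson's second skewness coefficient, 3 * (mean - median) / std. It is a rough indicator, not the same quantity as the moment-based value returned by `data.skew()`. The inputs below are copied from the `data.describe()` output above.

```python
# Summary statistics of Score, copied from data.describe() above
mean_score = 907.573770
median_score = 375.0
std_score = 1130.943639

# Pearson's second skewness coefficient: positive means the right tail is longer
pearson_skew = 3 * (mean_score - median_score) / std_score

print(round(pearson_skew, 3))
```

A clearly positive value agrees with the histogram, where a few very high scores pull the mean far to the right of the median.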
fig,ax = plt.subplots(figsize=(10, 10))
sns.heatmap(data.corr(), ax=ax, annot=True, linewidths=0.05, fmt= '.2f',cmap="magma") # the color intensity is based on
plt.show()
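Heat map showing the correlation between the different variables. With the magma colour map, lighter colours indicate values close to +1 (a strong positive relationship) and darker colours indicate lower values; the darkest cells here are the negative correlations with HighestPositionHeld.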
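A heat map is easy to scan, but the strongest pairs can also be listed explicitly by ranking the off-diagonal entries of the correlation matrix by absolute value. The sketch below does this on a small hypothetical frame (`toy`) rather than on the basketball data.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data: b is an exact multiple of a, c moves against both
toy = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 10],
    'c': [5, 3, 4, 1, 2],
})
corr = toy.corr()

# One entry per unordered pair of columns, skipping the diagonal
pairs = [(x, y, corr.loc[x, y]) for x, y in combinations(corr.columns, 2)]
# Rank by strength of relationship, ignoring the sign
pairs.sort(key=lambda t: abs(t[2]), reverse=True)

for x, y, r in pairs:
    print(x, y, round(r, 3))
```

Sorting by absolute value keeps strongly negative pairs near the top, which a colour scale can make easy to overlook.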
Univariate analysis refers to the analysis of a single variable. The main purpose of univariate analysis is to summarize and find patterns in the data. The key point is that there is only one variable involved in the analysis.
sns.set(rc={"figure.figsize":(7, 10)}) # set the figure size before plotting
sns.boxplot(x=data['Score']) # box plot (seaborn)
plt.show()
data.boxplot(column="Score",return_type='axes',figsize=(8,8)) # box plot (pandas)
plt.show()
sns.set(style="darkgrid", rc={"figure.figsize":(4, 4)}) # set the style and figure size before plotting
sns.histplot(data=data, x="Score", kde=True) # kde expects a boolean, not the string "True"
plt.show()
From the above analysis, we can infer that half of the teams have a Score below 375 (the median) and three quarters are below 1351, while fewer than 5 teams have a Score greater than 4000. The right skewness of the distribution is also visible in the box plot and histogram. There are visible outliers with respect to Score, all above 3000.
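The outliers drawn by a box plot usually follow the 1.5 * IQR rule, and the upper fence can be computed from the quartiles in `data.describe()`. The sketch below uses the Score quartiles and the five scores shown in `data.head()`; the variable names are illustrative.

```python
# Quartiles of Score, copied from data.describe() above
q1, q3 = 107.0, 1351.0
iqr = q3 - q1

# Points outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The five scores shown in data.head()
top_scores = [4385, 4262, 3442, 3386, 3368]
flagged = [s for s in top_scores if s > upper_fence]

print(upper_fence, flagged)
```

The lower fence is negative, so no team can be a low outlier; the rule only flags the high end of the distribution.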
Through bivariate analysis we try to analyze two variables simultaneously. As opposed to univariate analysis where we check the characteristics of a single variable, in bivariate analysis we try to determine if there is any relationship between two variables.
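There are essentially 3 major scenarios that we will come across when we perform bivariate analysis, depending on whether each variable is numerical or categorical: numerical vs numerical, numerical vs categorical, and categorical vs categorical.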
sns.barplot(data=data, x="HighestPositionHeld", y="WonGames")
plt.title('Highest Position vs Games Won by a team')
plt.xlabel('Highest Position')
plt.ylabel('Games Won')
plt.show()
This graph shows the relation between games won by a team and the highest position it achieved in ranking. Visibly the top teams have won many games.
sns.barplot(data=data, x="TournamentChampion", y="WonGames")
plt.title('Games Won vs Championships')
plt.xlabel('No. of Championships')
plt.ylabel('Games Won')
plt.show()
sns.set(rc={"figure.figsize":(2, 2)})
This graph shows the relation between games won by a team and the number of championships. The team with most championships has won more than 1600 games.
sns.barplot(data=data, x="BasketScored", y="PlayedGames")
plt.title('Games played vs Baskets scored')
plt.xlabel('Baskets Scored')
plt.ylabel('Games Played')
plt.show()
sns.set(rc={"figure.figsize":(50, 20)})
Games played vs Baskets scored
sns.histplot(data = data
,x = 'Score'
, y = 'WonGames'
,color = 'navy'
,kde = True
)
<AxesSubplot:xlabel='Score', ylabel='WonGames'>
This graph shows the relation between games won by a team and the score.
sns.histplot(data = data
,x = 'Team'
, y = 'BasketScored'
,color = 'navy'
,kde = True
)
<AxesSubplot:xlabel='Team', ylabel='BasketScored'>
Team one is the highest Basket scorer and also one of the oldest teams.
sns.barplot(data=data, x="Team", y="BasketGiven")
plt.title('Team vs Baskets Given')
plt.xlabel('Team')
plt.ylabel('Baskets given')
plt.show()
Team 7 has given the most baskets and Team 56 has given the fewest.
data.groupby(by=['Team'])['WonGames'].sum().reset_index().sort_values(['WonGames']).tail(10).plot(x='Team',
y='WonGames',
kind='bar',
figsize=(15,5))
plt.show()
This bar graph shows the number of games won by a team. Team one has won most games.
sns.scatterplot(data=data, x='WonGames', y='Score') # keyword arguments; positional x, y are not accepted by newer seaborn versions
sns.set(rc={"figure.figsize":(5, 15)})
A Scatter Plot showing the number of games won by a team with respect to score.
plt.figure(figsize=(10,5))
ax = sns.barplot(x='TeamLaunch', y='WonGames', data=data, palette='muted')
plt.xticks(rotation=45)
(Output omitted: 47 x-axis tick labels, one per distinct TeamLaunch value, starting with '1929' and ending with '2017-18'. The labels use inconsistent separators, for example '1941to42', '1991_92', '2016_17' and '1951~52'.)
The older teams have won more games than the newer teams.
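TeamLaunch is stored as text in several formats (for example '1931-32' and '1951~52' in the table above), so the bars are ordered by first appearance rather than by year. One possible clean-up, sketched below on a few sample values, is to extract the first four-digit year as an integer. The Series `launch` is a hypothetical sample, not the full column.

```python
import pandas as pd

# Sample TeamLaunch values in the different formats that occur in the data
launch = pd.Series(['1929', '1931-32', '1941to42', '1991_92', '1951~52', '2017-18'])

# Keep the first run of four digits (the starting year) and convert it to int
launch_year = launch.str.extract(r'(\d{4})', expand=False).astype(int)

print(launch_year.tolist())
```

With a numeric launch year, teams can be sorted or grouped by decade before comparing games won.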
sns.set(style="darkgrid")
sns.histplot(data=data, x="WonGames", y="TeamLaunch") # kde only applies to univariate histograms, so it is dropped here
plt.show()
sns.histplot(data=data, x="Team", y="Score")
plt.show()
sns.pairplot(data)
<seaborn.axisgrid.PairGrid at 0x22aa04a4a90>
figure = plt.figure(figsize=(15,5))
ax = sns.scatterplot(x='WonGames', y='Score', data=data, size="Score")
#Installation step
#!pip install pandas-profiling
#or
import sys
!{sys.executable} -m pip install pandas-profiling
!pip install -U pandas-profiling
Attempting uninstall: tangled-up-in-unicode
Found existing installation: tangled-up-in-unicode 0.0.7
Uninstalling tangled-up-in-unicode-0.0.7:
Successfully uninstalled tangled-up-in-unicode-0.0.7
Attempting uninstall: pandas-profiling
Found existing installation: pandas-profiling 2.13.0
Uninstalling pandas-profiling-2.13.0:
Successfully uninstalled pandas-profiling-2.13.0
Successfully installed pandas-profiling-3.0.0 pydantic-1.8.2 tangled-up-in-unicode-0.1.0
#import pandas_profiling
import pandas_profiling
df = pd.read_csv('C:/Users/Server/Desktop/APPLIED STATISTICS PROJECT/Basketball.csv')
#Getting the pandas profiling report
pandas_profiling.ProfileReport(df)
---------------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-------------------------------
A company reports specifically on technology business: technology news, analysis of emerging trends, and profiles of new tech businesses and products. Its event, Startup Battlefield, is the world’s pre-eminent startup competition. Startup Battlefield features 15-30 top early-stage startups pitching to top judges in front of a vast live audience, present in person and online.
The objective is to analyse the data on the various companies in the given dataset and perform the tasks specified in the steps below: draw insights from the attributes present in the dataset, plot distributions, state hypotheses and draw conclusions.
%matplotlib inline
Data = pd.read_csv('C:/Users/Server/Desktop/APPLIED STATISTICS PROJECT/EU.csv')
Data.shape
(662, 6)
Data.dtypes.value_counts()
object    6
dtype: int64
Data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 662 entries, 0 to 661
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Startup         662 non-null    object
 1   Product         656 non-null    object
 2   Funding         448 non-null    object
 3   Event           662 non-null    object
 4   Result          662 non-null    object
 5   OperatingState  662 non-null    object
dtypes: object(6)
memory usage: 31.2+ KB
Data.head()
| | Startup | Product | Funding | Event | Result | OperatingState |
|---|---|---|---|---|---|---|
| 0 | 2600Hz | 2600hz.com | NaN | Disrupt SF 2013 | Contestant | Operating |
| 1 | 3DLT | 3dlt.com | $630K | Disrupt NYC 2013 | Contestant | Closed |
| 2 | 3DPrinterOS | 3dprinteros.com | NaN | Disrupt SF 2016 | Contestant | Operating |
| 3 | 3Dprintler | 3dprintler.com | $1M | Disrupt NY 2016 | Audience choice | Operating |
| 4 | 42 Technologies | 42technologies.com | NaN | Disrupt NYC 2013 | Contestant | Operating |
Check the data for missing values
Data.isnull().sum().sum()
220
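The single total above does not show where the gaps are; according to the `Data.info()` output, they come from `Product` (6 missing) and `Funding` (214 missing). A minimal sketch of a per-column breakdown follows. The `toy` frame and its values are illustrative assumptions, not rows from EU.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame({
    "Startup": ["A", "B", "C", "D"],
    "Product": ["a.com", None, "c.com", "d.com"],
    "Funding": ["$1M", np.nan, np.nan, "$630K"],
})

per_column = toy.isnull().sum()      # missing values in each column
total = int(per_column.sum())        # same figure as toy.isnull().sum().sum()
share = toy.isnull().mean() * 100    # percentage of rows missing, per column

print(per_column)
print(total)
print(share.round(1))
```

Running the same three expressions on `Data` would show whether dropping rows mainly costs `Funding` observations.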
Dropping missing values
Data.dropna(inplace=True)
Data.isnull().sum()
Startup           0
Product           0
Funding           0
Event             0
Result            0
OperatingState    0
dtype: int64
Data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 446 entries, 1 to 661
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Startup         446 non-null    object
 1   Product         446 non-null    object
 2   Funding         446 non-null    object
 3   Event           446 non-null    object
 4   Result          446 non-null    object
 5   OperatingState  446 non-null    object
dtypes: object(6)
memory usage: 24.4+ KB
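Calling `dropna()` without arguments removes every row that has a missing value in any column. If only `Funding` matters for the later analysis, the `subset` parameter restricts the check to that column. The sketch below contrasts the two on a made-up frame; `toy` and its values are assumptions for illustration, not the project data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame({
    "Startup": ["A", "B", "C", "D"],
    "Product": ["a.com", None, "c.com", "d.com"],
    "Funding": ["$1M", "$2M", np.nan, "$630K"],
})

all_dropped = toy.dropna()                     # drops B (no Product) and C (no Funding)
funding_only = toy.dropna(subset=["Funding"])  # drops only C

print(len(all_dropped), len(funding_only))
```

Both calls return a new frame and leave `toy` unchanged, unlike the `inplace=True` form used above.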
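Convert the ‘Funding’ feature to a numerical value.
Remove the ',' and '$' symbols (replace them with an empty string)
Data['Funding'] = Data['Funding'].str.replace(',', '', regex=False)
# regex=False treats '$' as a literal character rather than the end-of-string anchor
Data['Funding'] = Data['Funding'].str.replace('$', '', regex=False)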
Data
| | Startup | Product | Funding | Event | Result | OperatingState |
|---|---|---|---|---|---|---|
| 1 | 3DLT | 3dlt.com | 630K | Disrupt NYC 2013 | Contestant | Closed |
| 3 | 3Dprintler | 3dprintler.com | 1M | Disrupt NY 2016 | Audience choice | Operating |
| 5 | 5to1 | 5to1.com | 19.3M | TC50 2009 | Contestant | Acquired |
| 6 | 8 Securities | 8securities.com | 29M | Disrupt Beijing 2011 | Finalist | Operating |
| 10 | AdhereTech | adheretech.com | 1.8M | Hardware Battlefield 2014 | Contestant | Operating |
| ... | ... | ... | ... | ... | ... | ... |
| 657 | Zivity | zivity.com | 8M | TC40 2007 | Contestant | Operating |
| 658 | Zmorph | zmorph3d.com | 1M | - | Audience choice | Operating |
| 659 | Zocdoc | zocdoc.com | 223M | TC40 2007 | Contestant | Operating |
| 660 | Zula | zulaapp.com | 3.4M | Disrupt SF 2013 | Audience choice | Operating |
| 661 | Zumper | zumper.com | 31.5M | Disrupt SF 2012 | Finalist | Operating |
446 rows × 6 columns
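The symbol stripping shown above can be checked on a few strings before applying it to the whole column. This is a small sketch with made-up values (the `raw` series is an assumption, not taken from the dataset); `regex=False` makes `'$'` a literal character, since as a regular expression it would mean "end of string".

```python
import pandas as pd

# Hypothetical sample values in the same style as the Funding column.
raw = pd.Series(["$630K", "$1M", "$19.3M", "$1,250K"])

cleaned = (
    raw.str.replace(",", "", regex=False)   # remove thousands separators
       .str.replace("$", "", regex=False)   # remove the currency symbol
)

print(cleaned.tolist())
```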
Replace characters K and M with the corresponding numeric values
# Numeric part (suffix stripped) multiplied by the factor implied by the suffix
Data['Funding'] = (
    Data['Funding'].replace(r'[KMB]+$', '', regex=True).astype(float)
    * Data['Funding'].str.extract(r'[\d\.]+([KMB]+)', expand=False)
        .fillna(1)
        .replace(['K', 'M', 'B'], [10**3, 10**6, 10**9])
        .astype(int)
)
Data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 446 entries, 1 to 661
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Startup         446 non-null    object
 1   Product         446 non-null    object
 2   Funding         446 non-null    float64
 3   Event           446 non-null    object
 4   Result          446 non-null    object
 5   OperatingState  446 non-null    object
dtypes: float64(1), object(5)
memory usage: 24.4+ KB
"Funding" has been converted to a numerical value (float64).
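The suffix conversion can also be written as a small mapping function, which may be easier to read and test than the chained expression. This is an alternative sketch, not the method used in this notebook; `MULTIPLIERS`, `funding_to_number` and the sample values are hypothetical names and data introduced here for illustration.

```python
import pandas as pd

# Assumed meaning of the suffixes: thousand, million, billion.
MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9}

def funding_to_number(text):
    """Convert a string such as '630K' or '19.3M' to a float."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in MULTIPLIERS:
        return float(text[:-1]) * MULTIPLIERS[suffix]
    return float(text)  # no suffix: the string is already a plain number

# Hypothetical sample in the same style as the cleaned Funding column.
sample = pd.Series(["630K", "1M", "19.3M", "500"])
converted = sample.map(funding_to_number)

print(converted)
```

The function assumes the '$' and ',' symbols have already been removed and that no value is missing.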
Data
| | Startup | Product | Funding | Event | Result | OperatingState |
|---|---|---|---|---|---|---|
| 1 | 3DLT | 3dlt.com | 630000.0 | Disrupt NYC 2013 | Contestant | Closed |
| 3 | 3Dprintler | 3dprintler.com | 1000000.0 | Disrupt NY 2016 | Audience choice | Operating |
| 5 | 5to1 | 5to1.com | 19300000.0 | TC50 2009 | Contestant | Acquired |
| 6 | 8 Securities | 8securities.com | 29000000.0 | Disrupt Beijing 2011 | Finalist | Operating |
| 10 | AdhereTech | adheretech.com | 1800000.0 | Hardware Battlefield 2014 | Contestant | Operating |
| ... | ... | ... | ... | ... | ... | ... |
| 657 | Zivity | zivity.com | 8000000.0 | TC40 2007 | Contestant | Operating |
| 658 | Zmorph | zmorph3d.com | 1000000.0 | - | Audience choice | Operating |
| 659 | Zocdoc | zocdoc.com | 223000000.0 | TC40 2007 | Contestant | Operating |
| 660 | Zula | zulaapp.com | 3400000.0 | Disrupt SF 2013 | Audience choice | Operating |
| 661 | Zumper | zumper.com | 31500000.0 | Disrupt SF 2012 | Finalist | Operating |
446 rows × 6 columns
sns.boxplot(x=Data['Funding'])
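A box plot draws its whiskers from the quartiles, so the same information can be read numerically. The sketch below computes the quartiles and the common 1.5 × IQR upper fence for only the ten Funding values visible in the table preview above, not the full 446 rows; the 1.5 multiplier is the usual convention and is assumed here.

```python
import numpy as np

# Funding values from the ten rows shown in the preview (not the full dataset).
funding = np.array([
    630_000, 1_000_000, 19_300_000, 29_000_000, 1_800_000,
    8_000_000, 1_000_000, 223_000_000, 3_400_000, 31_500_000,
], dtype=float)

q1, q3 = np.percentile(funding, [25, 75])  # first and third quartiles
iqr = q3 - q1                              # interquartile range
upper_fence = q3 + 1.5 * iqr               # points above this are drawn as outliers

outliers = funding[funding > upper_fence]
print(q1, q3, upper_fence)
print(outliers)
```

Applying the same steps to `Data['Funding']` would give the fences for the whole column.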
sns.set(rc={"figure.figsize":(4, 4)})
Data['OperatingState'].value_counts().plot(kind='bar')
<AxesSubplot:>
sns.set_style('whitegrid')
sns.displot(Data, x="Funding", bins=20)
sns.set(rc={"figure.figsize":(4, 4)})
sns.displot(Data, x="Funding", hue="OperatingState")
sns.set(rc={"figure.figsize":(10, 10)})
Data1 = Data.copy(deep=True)